A Text Classification Model via Multi-Level Semantic Features

Abstract

Text classification is a major task of NLP (Natural Language Processing) and has been a focus of attention for years. News classification, as a branch of text classification, is characterized by complex structure, large amounts of information, and long text length, which in turn decrease classification accuracy. To improve the classification accuracy of Chinese news texts, we present a text classification model based on multi-level semantic features. First, we add a category correlation coefficient to TF-IDF (Term Frequency-Inverse Document Frequency) and frequency concentration to CHI (Chi-Square), and extract keyword features with the improved algorithm. Then, we extract local semantic features with a symmetric-channel TextCNN and global semantic features from a BiLSTM with attention. Finally, we fuse the three kinds of features to predict text categories. Experiments on THUCNews, LTNews, and MCNews show that the presented method is highly accurate, reaching 98.01%, 90.95%, and 94.24% accuracy, respectively. With two orders of magnitude fewer parameters than Bert, the improvements relative to the Bert+FC baseline are 1.27%, 1.2%, and 2.81%, respectively.
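The abstract does not give formulas for the category correlation coefficient added to TF-IDF or the frequency concentration added to CHI. The sketch below therefore shows only the standard TF-IDF and CHI scores that the keyword step starts from, in plain Python. The function names (`tfidf`, `chi_square`, `rank_terms_by_chi`) are hypothetical illustrations, not the authors' code. The paper's modifications would sit on top of these baseline scores.

```python
import math
from collections import Counter


def tfidf(docs):
    """Standard TF-IDF per document: tf(t, d) * log(N / df(t)).

    docs is a list of tokenized documents (lists of terms). Returns one
    {term: score} dict per document. The paper's category correlation
    coefficient is NOT included (its formula is not given in the abstract).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def chi_square(docs, labels, term, category):
    """Standard CHI statistic for (term, category) from a 2x2 document table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category without the term
    d: docs outside the category without the term
    The paper's frequency concentration adjustment is NOT included.
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_terms_by_chi(docs, labels, category):
    """Hypothetical helper: rank the whole vocabulary by CHI for one category."""
    vocab = sorted({t for doc in docs for t in doc})
    # sorted() is stable even with reverse=True, so ties keep alphabetical order
    return sorted(
        vocab, key=lambda t: chi_square(docs, labels, t, category), reverse=True
    )
```

Consider a toy corpus of two "finance" documents and two "sports" documents. There, plain CHI gives sports-only terms such as "team" and "goal" the same score for the finance category as "stock". The statistic squares `a*d - b*c`, so it rewards negative association as much as positive association. That is a property of the baseline statistic itself, not a claim about how the paper's adjusted version behaves.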

Similar Resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Boosting for Text Classification with Semantic Features

Current text classification systems typically use term stems for representing document content. Ontologies allow the usage of features on a higher semantic level than single words for text classification purposes. In this paper we propose such an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting, a successful machine learning tec...

A Saliency Detection Model via Fusing Extracted Low-level and High-level Features from an Image

Saliency regions attract more human’s attention than other regions in an image. Low-level and high-level features are utilized in saliency region detection. Low-level features contain primitive information such as color or texture while high-level features usually consider visual systems. Recently, some salient region detection methods have been proposed based on only low-level features or hig...

Multi-class Animacy Classification with Semantic Features

Animacy is the semantic property of nouns denoting whether an entity can act, or is perceived as acting, of its own will. This property is marked grammatically in various languages, albeit rarely in English. It has recently been highlighted as a relevant property for NLP applications such as parsing and anaphora resolution. In order for animacy to be used in conjunction with other semantic feat...

Improving Multi-Document Summarization via Text Classification

Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSu...

Journal

Journal title: Symmetry

Year: 2022

ISSN: 0865-4824, 2226-1877

DOI: https://doi.org/10.3390/sym14091938